Contrastive Vision-Language Pre-training with Limited Resources
Abstract
Pioneering dual-encoder pre-training works (e.g., CLIP and ALIGN) have revealed the potential of aligning multi-modal representations with contrastive learning. However, these works require a tremendous amount of data and computational resources (e.g., billion-level web data and hundreds of GPUs), which prevents researchers with limited resources from reproduction and further exploration. To this end, we propose a stack of novel methods, which significantly cut down the heavy resource dependency and allow us to conduct dual-encoder multi-modal representation alignment with limited resources. Besides, we provide a reproducible baseline with competitive results, namely ZeroVL, using only 14M publicly accessible academic datasets and 8 V100 GPUs. Additionally, we collect 100M web data for pre-training and achieve results comparable or superior to those of state-of-the-art methods, further proving the effectiveness of our methods on large-scale data. We hope that this work will provide useful data points and experience for future research in contrastive vision-language pre-training. Code is available at https://github.com/zerovl/ZeroVL .
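For readers unfamiliar with the contrastive alignment the abstract refers to, the following is a minimal sketch of a symmetric image-text contrastive (InfoNCE-style) loss of the kind commonly used by CLIP-style dual encoders. It is not taken from the ZeroVL code base and does not reproduce the paper's resource-saving methods; the function names, the toy 2-dimensional embeddings, and the temperature value of 0.07 are all assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy losses.

    Pair i (image_embs[i], text_embs[i]) is treated as the positive;
    every other pairing in the batch is a negative.
    The temperature default is an illustrative assumption.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)

    # logits[i][j]: scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text direction: softmax over each row, target is the diagonal
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: softmax over each column, target is the diagonal
    columns = [list(col) for col in zip(*logits)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (i2t + t2i)


# Toy batch: matching pairs point in the same direction
aligned = symmetric_contrastive_loss(
    [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
)
# Same embeddings, but the texts are swapped so every pair is mismatched
shuffled = symmetric_contrastive_loss(
    [[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]
)
print(aligned < shuffled)
```

In this toy batch the loss is close to zero when each image embedding matches its own text embedding and much larger when the pairs are swapped, which is the behaviour the contrastive objective rewards. Because every other item in the batch acts as a negative, the number of negatives grows with batch size, which is one reason this style of training is usually associated with large compute budgets.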
Similar resources
Language identification with limited resources
Language identification is an important issue in many speech applications. We address this problem from the point of view of classification of sequences of phonemes, given the assumption that each language has its own phonotactic characteristics. In order to achieve this classification, we have to decode the speech utterances in terms of phonemes. The set of phonemes must be the same for all th...
Transliteration Generation and Mining with Limited Training Resources
We present DIRECTL+: an online discriminative sequence prediction model based on many-to-many alignments, which is further augmented by the incorporation of joint n-gram features. Experimental results show improvement over the results achieved by DIRECTL in 2009. We also explore a number of diverse resource-free and language-independent approaches to transliteration mining, which range from sim...
TREQ-AL: A word alignment system with limited language resources
We provide a rather informal presentation of a prototype system for word alignment based on our previous translation equivalence approach, discuss the problems encountered in the shared-task on word-aligning of a parallel Romanian-English text, present the preliminary evaluation results and suggest further ways of improving the alignment accuracy.
Project selection with limited resources in data envelopment analysis
This paper discusses allocating a fixed resource for producing a finite number of projects in order to obtain a desired level of efficiency. Note that a vector of limited resources is assumed to be at hand. This vector of resources can contain human resources, budget, equipment, and facilities. In any firm there exist different suggestions from subunits for running...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-20059-5_14